Argumentative Zoning: Information Extraction from Scientific Text

Authors

  • Simone Teufel
  • Vasilis Karaiskos
  • Anne Wilson
  • David McKelvie
Abstract

ing service in physics and the manufacturer of the INSPEC database, indexed 174,000 items in one year alone (1996), of which about 146,500 are journal articles. However, these already impressive numbers exclude less important journals, workshop proceedings, conference papers and non-English material. Indeed, the growth rate is probably exponential: Maron and Kuhns (1960) estimated that the indexed scientific material doubles in volume every 12 years. The masses of information the researcher is exposed to make it hard for her to find the needle in the haystack, as it is impossible to skim-read even a portion of the potentially relevant material.

The information access and search problem is particularly acute for researchers in interdisciplinary subject areas like computational linguistics or cognitive science, as they must in principle be aware of articles in a whole range of neighbouring fields, such as computer science, theoretical linguistics, psychology, philosophy and formal logic.

Apart from keeping abreast of developments in scientific fields in general, more practical requirements emerge when researchers who are experienced in one scientific field become interested in a new field in which they have no prior knowledge. Their information needs have suddenly changed: Kircz (1991) states that such readers seek understanding instead of a firm, formal answer. The exact information need is not known beforehand; the questions they pose are not precise (Kircz's example is the question “what are they doing in high-temperature super-conductivity?” (p. 357)). Belkin (1980) refers to their situation as an “anomalous knowledge state”.

We think that researchers in a new field initially need answers to the following questions:

  • What are the main problems and main approaches? Knowledge of a number of important concepts in the field needs to be acquired: the current problems and the standard methodologies in the field. For the main approaches, the researcher needs to know their strengths and weaknesses. The searcher also needs to gain an overview of the evaluation methodology and typical numerical results in the field.
  • Which researchers and groups are connected with which concepts? Researchers' names, and the institutions where they work, must be associated with seminal approaches and seminal papers. The searcher must determine schools of thought: clusters of people working together, sharing premises and building on each other's work.

If researchers read a paper in a new field, they are particularly interested in the general approaches described, the relation to other work, and its conclusions, instead of specialist details (Kircz, 1991). Oddy et al. (1992) and Shum (1998) argue that what such readers particularly need is an embedding of the particular piece of work within a broader context and in relation to other works.

The preferred information source at that stage of knowledge is an experienced colleague. Another standard technique for gaining a deeper overview of a field is to find a recent review article, to follow up the bibliographic links and to read as many of those papers as time permits. But sometimes neither of these useful aids is available, and a full-blown bibliographic search using an electronic document retrieval system (e.g. BIDS, FirstSearch or MEDLINE) is necessary. This is typically done by a keyword search, where the keywords can be combined with Boolean operators. In most commercial bibliographic databases, keyword search is still performed on document surrogates, rather than on the full text of the document, as the full text is not always available in electronic form. Typical document surrogates used in document retrieval environments are bibliographic information (i.e. title, authors, date of publication, journal name), a list of

Similar resources

Accurate Argumentative Zoning with Maximum Entropy models

We present a maximum entropy classifier that significantly improves the accuracy of Argumentative Zoning in scientific literature. We examine the features used to achieve this result and experiment with Argumentative Zoning as a sequence tagging task, decoded with Viterbi using up to four previous classification decisions. The result is a 23% F-score increase on the Computational Linguistics co...

Robust Argumentative Zoning for Sensemaking in Scholarly Documents

We present an automated approach to classify sentences of scholarly work with respect to their rhetorical function. While previous work that achieves this task of argumentative zoning requires richly annotated input, our approach is robust to noise and can process raw text. Even in cases where the input has noise (as is obtained from optical character recognition or text extraction from PDF fil...

Automatic Critiquing of Novices’ Scientific Writing Using Argumentative Zoning

Scientific writing can be hard for novice writers, even in their own language. We present a system that applies Argumentative Zoning (AZ) (Teufel & Moens 2002), a method of determining argumentative structure in texts, to the task of advising novice writers on their writing. We address this task by automatically determining the rhetorical/argumentative status and the implicit author stance of a...

Argumentative analysis of the ACL Anthology (Analyse argumentative du corpus de l'ACL (ACL Anthology)) [in French]

This paper presents an application of Text Zoning to the ACL Anthology. Text Zoning is known to be useful to characterize the content of papers, especially in the scientific domain. We show that recent techniques based on weakly supervised learning obtain excellent results on the ACL Anthology. Although these kinds of techniques are known in the domain, this is the first time they have been applied to the ...

Weakly supervised learning of information structure of scientific abstracts - is it accurate enough to benefit real-world tasks in biomedicine?

MOTIVATION Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the methods, results or conclusions of the study in question. Several approaches have been developed to identify such information in scientific journal articles. The best of these have yielded promising results and proved useful for biomedical text mini...

Towards Discipline-Independent Argumentative Zoning: Evidence from Chemistry and Computational Linguistics

Argumentative Zoning (AZ) is an analysis of the argumentative and rhetorical structure of a scientific paper. It has been shown to be reliably used by independent human coders, and has proven useful for various information access tasks. Annotation experiments have however so far been restricted to one discipline, computational linguistics (CL). Here, we present a more informative AZ scheme with...

Publication date: 1999